Executed: Mon Mar 27 11:39:53 2017

Duration: 5 seconds.

usALEX - Corrections - Direct excitation physical parameter

This notebook estimates direct excitation coefficient $d_T$ from μs-ALEX data.

Definitions Memo

$$n_d = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_D \, \eta_{D_{det}}^{D_{em}} \, (1-E)$$$$n_a = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_A \, \eta_{A_{det}}^{A_{em}} \, E$$$$ n_{aa} = I_{A_{ex}} \, \sigma_{A_{ex}}^A \, \phi_A \, \eta_{A_{det}}^{A_{em}}$$$$n_a^* = n_a + Lk + Dir$$

where

$$Lk = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_D \, \eta_{A_{det}}^{D_{em}} \, (1-E)$$$$Dir = I_{D_{ex}} \, \sigma_{D_{ex}}^A \, \phi_A \, \eta_{A_{det}}^{A_{em}}$$$$\gamma = \frac{\phi_A\,\eta_{D_{det}}^{A_{em}}}{\phi_D\,\eta_{D_{det}}^{D_{em}}}$$

Aim of this notebook

What is already computed?

We previously fitted the leakage and gamma coefficient from the RAW PR values for the 5 dsDNA measurements. We also fitted the direct excitation coefficient expressed (dir_ex_aa) as a function of the A-signal during A-excitation (naa). In symbols, dir_ex_aa is defined as:

$$ d_{AA} = \frac{n_{a}}{n_{aa}}$$

for a A-only population.

What we want to compute?

Alternatively, we can express the direct excitation contribution ($Dir$) as a function of the total corrected burst size:

$$ Dir = d_T\, (n_a + \gamma n_d)$$

With this definition, expressing $d_T$ as a function of the physical parameters we obtain:

$$d_T = \frac{\sigma_{D_{ex}}^A}{\sigma_{D_{ex}}^D} $$

where $\sigma_{Dex}^A$ and $\sigma_{Dex}^D$ are the absorption cross-sections of the Acceptor and Donor dye at wavelength of Donor laser.

Finally, remembering the definition of $\beta$:

$$ \beta = \frac{I_{A_{ex}}\sigma_{A_{ex}}^A}{I_{D_{ex}}\sigma_{D_{ex}}^D}$$

we can express $d_T$ as the product of $\beta$ and $d_{AA}$:

$$ d_T = \beta \, d_{AA}$$

Note that $d_T$ is a property of the Donor-Acceptor dyes pair and of the Donor excitation wavelength. As such, differently from $d_AA$, the $d_T$ coefficient is valid for the same sample in any setup using the same donor excitation wavelength, such as the single-spot μs-ALEX and the multi-spot system. Additionally, $d_T$ allows to correct for direct acceptor excitation using only donor-excitation quantities. Therefore the same correction formula can be used both in two-laser (e.g. single-spot μs-ALEX) and single-laser systems (e.g. 8-spot system).

How we compute it?

We use two different procedures both yielding an estimation of $d_T$. Except for the numerical accuracy the two procedures are equivalent.

Procedure 1: Using beta

From the previous relation between $d_T = \beta \,d_{AA}$ is possible to directly estimate $d_T$ with the values of $\beta$ and $d_{AA}$ we already fitted in previous notebooks.

Procedure 2: Correction formula

It is possible to go from the raw $E_R$ (only background correction, no leakage, direct excitation nor gamma) to the fully-corrected $E$ using the formula:

$$ E = f(E_R,\, \gamma,\, L_k,\, d_T) = \frac{E_{R} \left(L_{k} + d_T \gamma + 1\right) - L_{k} - d_T \gamma} {E_{R} \left(L_{k} - \gamma + 1\right) - L_{k} + \gamma}$$

We can compute the corrected $E$ for the 5 dsDNA samples by fitting the fully-corrected histograms (histograms with γ, leakage and direct excitation corrections). We can also fit the 5 $E_R$ values for the same samples from the proximity ratio histograms (only background correction).

Therefore, using the previous formula we can fit $d_T$ (dir_ex_t) by minimizing the error between the 5 $E$ values fitted from corrected histograms and the 5 $E$ values obtained correcting the 5 $E_R$ values from the fit of the proximity ratio histograms.

Loading data

We load the needed libraries and FRETBursts which includes the FRET correction formulas ($E = f(E_R,\, \gamma,\, L_k,\, d_T)$).


In [1]:
import os
import numpy as np
import pandas as pd
import lmfit
from fretbursts import *
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format='retina'  # for hi-dpi displays
sns.set_style('whitegrid')


 - Optimized (cython) burst search loaded.
 - Optimized (cython) photon counting loaded.
--------------------------------------------------------------
 You are running FRETBursts (version 0.5.9).

 If you use this software please cite the following paper:

   FRETBursts: An Open Source Toolkit for Analysis of Freely-Diffusing Single-Molecule FRET
   Ingargiola et al. (2016). http://dx.doi.org/10.1371/journal.pone.0160716 

--------------------------------------------------------------

Load Raw PR


In [2]:
#bsearch_ph_sel = 'AND-gate'
bsearch_ph_sel = 'Dex'

data_file = 'results/usALEX-5samples-PR-raw-%s.csv' % bsearch_ph_sel

These are the RAW proximity ratios for the 5 samples (only background correction, no leakage nor direct excitation):


In [3]:
data_raw = pd.read_csv(data_file).set_index('sample')
data_raw[['E_gauss_w', 'E_kde_w', 'E_gauss_w_err', 'E_gauss_w_fiterr', 'n_bursts_all', 'n_bursts_fret']]


Out[3]:
E_gauss_w E_kde_w E_gauss_w_err E_gauss_w_fiterr n_bursts_all n_bursts_fret
sample
7d 0.927817 0.9338 0.002130 0.002408 1654 641
12d 0.748896 0.7508 0.002467 0.001494 1692 1157
17d 0.480670 0.4906 0.002001 0.002126 3459 2588
22d 0.280392 0.2752 0.001454 0.001775 2798 2181
27d 0.194729 0.1944 0.002032 0.001035 1146 792

Columns legend

  • E_gauss_w, center of Gaussian peak, fitted on the weighted histograms.
  • E_kde_w, maximum of the KDE computed on the weighted histograms.
  • E_gauss_w_err is $\frac{\sigma}{\sqrt{n_{FRET}}}$, where $\sigma$ is the std.dev. of the Gaussian peak and $n_{FRET}$ is the number of bursts in the FRET population.
  • E_gauss_w_fiterr is the standard error returned by the non-linear least square fit. It is computed from the Jacobian taking into account the degrees of freedom.
  • n_bursts_all, total number of bursts
  • n_bursts_fret, number of bursts in the FRET population

Load Corrected E

And these are the FRET efficiencies fitted from corrected histograms for the same 5 samples:


In [4]:
data_file = 'results/usALEX-5samples-E-corrected-all-ph.csv'
data_corr = pd.read_csv(data_file).set_index('sample')
data_corr[['E_gauss_w', 'E_kde_w', 'E_gauss_w_err', 'E_gauss_w_fiterr', 'n_bursts_all', 'n_bursts_fret']]


Out[4]:
E_gauss_w E_kde_w E_gauss_w_err E_gauss_w_fiterr n_bursts_all n_bursts_fret
sample
7d 0.924307 0.9290 0.002583 0.001782 1172 542
12d 0.727043 0.7398 0.002990 0.001995 1307 948
17d 0.422624 0.4268 0.002549 0.002008 2489 1964
22d 0.181067 0.1774 0.001888 0.001536 2054 1672
27d 0.083971 0.0826 0.002856 0.001585 791 589

Columns legend

  • E_gauss_w, center of Gaussian peak, fitted on the weighted histograms.
  • E_kde_w, maximum of the KDE computed on the weighted histograms.
  • E_gauss_w_err is $\frac{\sigma}{\sqrt{n_{FRET}}}$, where $\sigma$ is the std.dev. of the Gaussian peak and $n_{FRET}$ is the number of bursts in the FRET population.
  • E_gauss_w_fiterr is the standard error returned by the non-linear least square fit. It is computed from the Jacobian taking into account the degrees of freedom.
  • n_bursts_all, total number of bursts
  • n_bursts_fret, number of bursts in the FRET population

Load SNA Data


In [5]:
raw_data_file_sna = 'results/alix/us-ALEX SNA Results 2016-10-12.csv'

In [6]:
rsna = pd.read_csv(raw_data_file_sna, index_col=0)
rsna


Out[6]:
<E> SDV(E) Median Mode
Sample
7d 0.9125 0.0450 0.919782 0.935147
12d 0.7250 0.0675 0.728564 0.736041
17d 0.4225 0.0800 0.421083 0.418087
22d 0.1850 0.0350 0.183270 0.179754
27d 0.0800 0.0000 0.080000 0.080000

In [7]:
sna = rsna[['<E>', 'Mode']].round(4)
sna.columns = ['SNA_E_mean', 'SNA_E_max']
sna


Out[7]:
SNA_E_mean SNA_E_max
Sample
7d 0.9125 0.9351
12d 0.7250 0.7360
17d 0.4225 0.4181
22d 0.1850 0.1798
27d 0.0800 0.0800

In [8]:
data_file_sna = 'results/usALEX-5samples-E-SNA.csv'

In [9]:
sna.to_csv(data_file_sna)

Load μs-ALEX corrections


In [10]:
leakage_coeff_fname = 'results/usALEX - leakage coefficient DexDem.csv'
leakage = np.loadtxt(leakage_coeff_fname)

print('Leakage coefficient:', leakage)


Leakage coefficient: 0.10029

In [11]:
dir_ex_coeff_fname = 'results/usALEX - direct excitation coefficient dir_ex_aa.csv'
dir_ex_aa = np.loadtxt(dir_ex_coeff_fname)

print('Dir. excitation AA:', dir_ex_aa)


Dir. excitation AA: 0.06062

In [12]:
dir_ex_t_datasheet_fname = 'results/Dyes - ATT0647N-ATTO550 abs X-section ratio at 532nm.csv'
dir_ex_t_datasheet = np.loadtxt(dir_ex_t_datasheet_fname)

print('Direct excitation (dir_ex_t) from datasheet:', dir_ex_t_datasheet)


Direct excitation (dir_ex_t) from datasheet: 0.106516290727

In [13]:
gamma_coeff_fname = 'results/usALEX - gamma factor - all-ph.csv'
gamma = np.loadtxt(gamma_coeff_fname)

print('Gamma factor:', gamma)


Gamma factor: 1.020526

In [14]:
beta_coeff_fname = 'results/usALEX - beta factor - all-ph.csv'
beta = np.loadtxt(beta_coeff_fname)

print('Beta factor:', beta)


Beta factor: 0.813553

Procedure 1

Compute $d_T$ using $\beta$ and $d_{AA}$:


In [15]:
dir_ex_t_beta = dir_ex_aa * beta
'%.5f' % dir_ex_t_beta


Out[15]:
'0.04932'

In [16]:
with open('results/usALEX - direct excitation coefficient dir_ex_t beta.csv', 'w') as f:
    f.write('%.5f' % dir_ex_t_beta)

With this coefficient, computing the corrected $E$ for the 5 dsDNA samples we obtain:


In [17]:
PR_corr_kde = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                leakage=leakage, 
                                                dir_ex_t=dir_ex_t_beta,
                                                gamma=gamma)*100
PR_corr_kde


Out[17]:
sample
7d     92.873302
12d    72.773077
17d    43.140290
22d    17.621842
27d     7.807774
Name: E_kde_w, dtype: float64

In [18]:
PR_corr_gauss = fretmath.correct_E_gamma_leak_dir(data_raw.E_gauss_w, 
                                                  leakage=leakage, 
                                                  dir_ex_t=dir_ex_t_beta,
                                                  gamma=gamma)*100
PR_corr_gauss


Out[18]:
sample
7d     92.225478
12d    72.560799
17d    41.983996
22d    18.247880
27d     7.848009
Name: E_gauss_w, dtype: float64

Procedure 2

Datasheet-based direct excitation

The coefficient $d_T$ can be estimated from data-sheet values of $\sigma_{D_{ex}}^A$ and $\sigma_{D_{ex}}^D$.

Using the datasheet values provided by ATTOTec (in PBS buffer) we obtain a $d_T$ estimation close to 10%:


In [19]:
dir_ex_t_datasheet


Out[19]:
array(0.106516290727)

With this the corrected $E$ for the 5 dsDNA samples are:


In [20]:
E_datasheet = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                leakage=leakage, 
                                                dir_ex_t=dir_ex_t_datasheet,
                                                gamma=gamma)*100
E_datasheet


Out[20]:
sample
7d     92.484823
12d    71.288927
17d    40.040846
22d    13.131377
27d     2.782340
Name: E_kde_w, dtype: float64

Comparing these values with the ones obtained fitting the corrected E histograms we observe a significant discrepancy:


In [21]:
out = data_corr[['E_kde_w']].copy()*100
out.columns = ['E_alex']
out['E_datasheet'] = E_datasheet
out


Out[21]:
E_alex E_datasheet
sample
7d 92.90 92.484823
12d 73.98 71.288927
17d 42.68 40.040846
22d 17.74 13.131377
27d 8.26 2.782340

In [22]:
out.plot(alpha=0.4, lw=3, style=dict(E_alex='-ob', E_datasheet='-sr'));


NOTE: The corrected FRET efficiencies using the datasheet and μs-ALEX-based direct excitation do not match well.

Fitting direct excitation $d_T$


In [23]:
def residuals_absolute(params, E_raw, E_ref):
    dir_ex_t = params['dir_ex_t'].value
    return E_ref - fretmath.correct_E_gamma_leak_dir(E_raw, 
                                                     leakage=leakage, 
                                                     gamma=gamma, 
                                                     dir_ex_t=dir_ex_t)

In [24]:
def residuals_relative(params, E_raw, E_ref):
    dir_ex_t = params['dir_ex_t'].value
    return (E_ref - fretmath.correct_E_gamma_leak_dir(E_raw, 
                                                      leakage=leakage, 
                                                      gamma=gamma, 
                                                      dir_ex_t=dir_ex_t))/E_ref

In [25]:
params = lmfit.Parameters()
params.add('dir_ex_t', value=0.05)

In [26]:
m = lmfit.minimize(residuals_absolute, params, args=(data_raw.E_kde_w, data_corr.E_kde_w))
lmfit.report_fit(m.params, show_correl=False)


[[Variables]]
    dir_ex_t:   0.04614939 +/- 0.004939 (10.70%) (init= 0.05)

In [27]:
m = lmfit.minimize(residuals_relative, params, args=(data_raw.E_kde_w, data_corr.E_kde_w))
lmfit.report_fit(m.params, show_correl=False)


[[Variables]]
    dir_ex_t:   0.04482660 +/- 0.001150 (2.57%) (init= 0.05)

NOTE: The fitted dir_ex_t is 4.5% as opposed to 10.6% as expected from the absorption spectra of ATTO550 and ATTO647 at 532nm.


In [28]:
'%.5f' % m.params['dir_ex_t'].value


Out[28]:
'0.04483'

In [29]:
with open('results/usALEX - direct excitation coefficient dir_ex_t fit.csv', 'w') as f:
    f.write('%.5f' % m.params['dir_ex_t'].value)

In [30]:
PR_corr_kde_dfit = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                     leakage=leakage, 
                                                     dir_ex_t=m.params['dir_ex_t'].value,
                                                     gamma=gamma)*100
PR_corr_kde_dfit.name = 'PR_corr_kde_dfit'
PR_corr_kde_dfit


Out[30]:
sample
7d     92.903804
12d    72.889605
17d    43.383644
22d    17.974412
27d     8.202347
Name: PR_corr_kde_dfit, dtype: float64

Corrected E


In [31]:
E = pd.concat([data_corr[['E_kde_w', 'E_gauss_w']]*100, PR_corr_kde, PR_corr_gauss, sna*100], axis=1)
E.columns = ['E KDE', 'E Gauss', 'PR KDE', 'PR Gauss', 'SNA E mean', 'SNA E max']
E


Out[31]:
E KDE E Gauss PR KDE PR Gauss SNA E mean SNA E max
7d 92.90 92.4307 92.873302 92.225478 91.25 93.51
12d 73.98 72.7043 72.773077 72.560799 72.50 73.60
17d 42.68 42.2624 43.140290 41.983996 42.25 41.81
22d 17.74 18.1067 17.621842 18.247880 18.50 17.98
27d 8.26 8.3971 7.807774 7.848009 8.00 8.00

In [32]:
E.plot.bar(table=np.round(E, 2).T)
plt.ylabel('FRET (%)')
plt.gca().xaxis.set_visible(False)
#plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);


Estimated errors


In [33]:
Eerr = (pd.DataFrame().assign(
            E_gauss_w_err = data_corr.E_gauss_w_err, E_gauss_w_fiterr = data_corr.E_gauss_w_fiterr,
            PR_gauss_w_err = data_raw.E_gauss_w_err, PR_gauss_w_fiterr = data_raw.E_gauss_w_fiterr,
            SNA_SDV=rsna['SDV(E)']))*100
Eerr


Out[33]:
E_gauss_w_err E_gauss_w_fiterr PR_gauss_w_err PR_gauss_w_fiterr SNA_SDV
sample
7d 0.2583 0.1782 0.2130 0.2408 4.50
12d 0.2990 0.1995 0.2467 0.1494 6.75
17d 0.2549 0.2008 0.2001 0.2126 8.00
22d 0.1888 0.1536 0.1454 0.1775 3.50
27d 0.2856 0.1585 0.2032 0.1035 0.00

In [34]:
nbursts = data_corr[['n_bursts_all', 'n_bursts_fret']]
nbursts


Out[34]:
n_bursts_all n_bursts_fret
sample
7d 1172 542
12d 1307 948
17d 2489 1964
22d 2054 1672
27d 791 589

Note on standard errors

The standard error for the Gausian peak fit is reported both as the error returned from the fit routine (*gauss_w_fiterr) and as $\frac{\sigma}{\sqrt{n_{FRET}}}$ (*gauss_w_err, see Columns legend above for more details). The columns starting with E_ are from fit of the corrected FRET histograms. The columns starting with PR_ are from fit of the PR histograms. In principle, the PR errors should be propagated through the correction formula, but this prpagation is not reported here (error is small).

The KDE method does not naturally provide a standard error. In principle the error can be computed using the bootstrap method, which is computationally intensive and is not reported here.

The SNA method returns a distribution of FRET efficiencies that, when shot-noise is added, best fits the experimental histogram. We report estimates of E using the max or the mean of the E distribution. We report the error as the standard deviation of the E distribution. Therefore this value is common to both E estimates (max or mean).

Comparison of different methods

A more empirical way to compute the error is comparing results of the different methods, as done below.


In [35]:
E[['PR KDE', 'PR Gauss', 'E KDE']].plot(kind='bar')
E[['PR KDE', 'PR Gauss', 'E KDE']].plot(lw=3);
print('Max error E_alex vs E_corr_pr: %.2f' % (E['E KDE'] - E['PR KDE']).abs().max())
print('Max error E_alex vs E_beta:    %.2f' % (E['E KDE'] - E['PR Gauss']).abs().max())
print('Max error E_beta vs E_corr_pr: %.2f' % (E['PR Gauss'] - E['PR KDE']).abs().max())


Max error E_alex vs E_corr_pr: 1.21
Max error E_alex vs E_beta:    1.42
Max error E_beta vs E_corr_pr: 1.16

In [36]:
x = [int(idx[:-1]) for idx in out.index]
plt.plot(x, 'E KDE', data=E)
plt.plot(x, 'PR KDE', data=E)
plt.plot(x, 'PR Gauss', data=E)
plt.xlabel('Distance in base-pair')
plt.ylabel('FRET');



In [37]:
E['E KDE'] - E['PR KDE']


Out[37]:
7d     0.026698
12d    1.206923
17d   -0.460290
22d    0.118158
27d    0.452226
dtype: float64

NOTE: Fitting $d_T$ to match $E$ from corrected histograms with $E$ from PR correction formula produces a max difference of 1% for the 12d sample. The match is well below the fitting accuracy (> 2%).

Save


In [38]:
E.to_csv('results/usALEX-5samples-E-all-methods.csv', float_format='%.3f')

In [39]:
E.round(3)


Out[39]:
E KDE E Gauss PR KDE PR Gauss SNA E mean SNA E max
7d 92.90 92.431 92.873 92.225 91.25 93.51
12d 73.98 72.704 72.773 72.561 72.50 73.60
17d 42.68 42.262 43.140 41.984 42.25 41.81
22d 17.74 18.107 17.622 18.248 18.50 17.98
27d 8.26 8.397 7.808 7.848 8.00 8.00

In [40]:
Eerr.to_csv('results/usALEX-5samples-E-all-methods_errors.csv', float_format='%.4f')